Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization

نویسندگان

  • Jian Zhang
  • Rong Jin
  • Yiming Yang
  • Alexander G. Hauptmann
چکیده

Logistic Regression (LR) has been widely used in statistics for many years, and has received extensive study in machine learning community recently due to its close relations to Support Vector Machines (SVM) and AdaBoost. In this paper, we use a modified version of LR to approximate the optimization of SVM by a sequence of unconstrained optimization problems. We prove that our approximation will converge to SVM, and propose an iterative algorithm called “MLRCG” which uses Conjugate Gradient as its inner loop. Multiclass version “MMLR-CG” is also obtained after simple modifications. We compare the MLR-CG with SVM over different text categorization collections, and show that our algorithm is much more efficient than SVM when the number of training examples is very large. Results of the multiclass version MMLR-CG is also reported.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Least Squares Support Vector Machine for Constitutive Modeling of Clay

Constitutive modeling of clay is an important research in geotechnical engineering. It is difficult to use precise mathematical expressions to approximate stress-strain relationship of clay. Artificial neural network (ANN) and support vector machine (SVM) have been successfully used in constitutive modeling of clay. However, generalization ability of ANN has some limitations, and application of...

متن کامل

Large-Scale Bayesian Logistic Regression for Text Categorization

Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of doc...

متن کامل

Trust Region Newton Method for Large-Scale Logistic Regression

Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than...

متن کامل

Large margin multinomial mixture model for text categorization

In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...

متن کامل

A sparse version of the ridge logistic regression for large-scale text categorization

The ridge logistic regression has successfully been used in text categorization problems and it has been shown to reach the same performance as the Support Vector Machine but with the main advantage of computing a probability value rather than a score. However, the dense solution of the ridge makes its use unpractical for large scale categorization. On the other side, LASSO regularization is ab...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003